Collaborative filtering (CF) is a widely studied problem in recommender systems. Linear autoencoders are a well-established approach to CF that estimate item-item relations by encoding user-item interactions. Despite their excellent performance, the rapidly growing computation and storage costs caused by the ever-increasing number of items limit their applicability in large-scale real-world scenarios. Recently, graph-based approaches have achieved success on CF with high scalability and have been shown to share commonalities with linear autoencoders in modeling user-item interactions. Motivated by this, we propose Efficient and Scalable Recommendation via Item-Item Graph Partitioning (ERGP), which aims to address the limitations of linear autoencoders. In particular, a recursive graph partitioning strategy is proposed to ensure that the item set is divided into several partitions of bounded size. Linear autoencoders then encode user-item interactions within each partition while preserving global information across the entire item set. This allows ERGP to guarantee both high efficiency and high scalability as the number of items grows. Experiments on 3 public datasets and 3 open benchmark datasets demonstrate the effectiveness of ERGP, which outperforms state-of-the-art models with lower training time and storage costs.
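For context, linear autoencoders such as EASE admit a closed-form ridge solution, and partitioning the item set turns one large solve into several small ones. A minimal NumPy sketch of that idea (names, the regularizer `lam`, and the EASE-style closed form are illustrative assumptions, not the authors' released ERGP code):

```python
# Sketch: fit an EASE-style linear item-item autoencoder per partition.
import numpy as np

def fit_linear_autoencoder(X, lam=100.0):
    """Closed-form ridge solution with a zero-diagonal constraint:
    B_ij = -P_ij / P_jj, where P = (X^T X + lam*I)^{-1}."""
    G = X.T @ X + lam * np.eye(X.shape[1])
    P = np.linalg.inv(G)
    B = P / (-np.diag(P))      # scale each column by its -diagonal
    np.fill_diagonal(B, 0.0)   # items may not reconstruct themselves
    return B

def fit_partitioned(X, partitions, lam=100.0):
    """One small autoencoder per item partition; cost drops from
    O(n^3) on all n items to a sum of cubes of partition sizes."""
    n_items = X.shape[1]
    B = np.zeros((n_items, n_items))
    for idx in partitions:              # idx: array of item indices
        B[np.ix_(idx, idx)] = fit_linear_autoencoder(X[:, idx], lam)
    return B

# Toy usage: 6 users x 8 items, two partitions of 4 items each.
X = (np.random.rand(6, 8) > 0.6).astype(float)
B = fit_partitioned(X, [np.arange(4), np.arange(4, 8)])
scores = X @ B                          # predicted user-item scores
```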
Reliable and efficient validation technologies are critical for the recent development of multi-vehicle cooperation and vehicle-road-cloud integration. In this paper, we introduce our miniature experimental platform, Mixed Cloud Control Testbed (MCCT), developed based on a new notion of Mixed Digital Twin (mixedDT). Combining Mixed Reality with Digital Twin, mixedDT integrates the virtual and physical spaces into a mixed one, where physical entities coexist and interact with virtual entities via their digital counterparts. Under the framework of mixedDT, MCCT contains three major experimental platforms in the physical, virtual and mixed spaces respectively, and provides unified access for various human-machine interfaces and external devices such as driving simulators. A cloud unit, where the mixed experimental platform is deployed, is responsible for fusing multi-platform information and assigning control instructions, contributing to synchronous operation and real-time cross-platform interaction. Particularly, MCCT allows for multi-vehicle coordination among heterogeneous multi-source vehicles (e.g., physical vehicles, virtual vehicles and human-driven vehicles). Validation experiments on vehicle platooning demonstrate the flexibility and scalability of MCCT.
Whole abdominal organ segmentation plays an important role in abdominal lesion diagnosis, radiotherapy planning, and follow-up. However, delineating all abdominal organs by hand is time-consuming and very expensive for oncologists. Recently, deep learning-based medical image segmentation has shown the potential to reduce manual delineation effort, but it still requires a large-scale, finely annotated dataset for training. Although many efforts have been devoted to this task, there are still few large image datasets covering the whole abdominal region with accurate and detailed annotations for whole abdominal organ segmentation. In this work, we establish a large-scale \textit{W}hole abdominal \textit{OR}gans \textit{D}ataset (\textit{WORD}) for algorithm research and clinical application development. This dataset contains 150 abdominal CT volumes (30,495 slices), and each volume has 16 organs with fine pixel-level annotations and scribble-based sparse annotations, which may be the largest dataset with whole abdominal organ annotations. Several state-of-the-art segmentation methods are evaluated on this dataset. Moreover, we also invite clinical oncologists to revise the model predictions in order to measure the gap between deep learning methods and real oncologists. We further introduce and evaluate a new scribble-based weakly supervised segmentation method on this dataset. This work provides a new benchmark for the whole abdominal multi-organ segmentation task, and these experiments can serve as a baseline for future research and clinical application development. The codebase and dataset will be released at https://github.com/HiLab-git/WORD.
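Benchmarks of this kind are typically reported with the per-organ Dice coefficient; a minimal sketch of such a metric follows (illustrative only; the official evaluation code lives in the repository above):

```python
# Minimal per-organ Dice evaluation sketch for multi-label CT volumes.
import numpy as np

def dice_per_organ(pred, gt, num_organs=16):
    """pred, gt: integer label volumes of the same shape (0 = background)."""
    scores = {}
    for organ in range(1, num_organs + 1):
        p, g = pred == organ, gt == organ
        denom = p.sum() + g.sum()
        # Dice = 2|P ∩ G| / (|P| + |G|); NaN when the organ is absent in both
        scores[organ] = 2.0 * (p & g).sum() / denom if denom else float("nan")
    return scores
```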
Knowledge graphs (KG) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG and CKG completion suffer from long-tail relations and newly added relations that do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of both graph representation learning and few-shot learning, has been proposed to tackle the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at this task, covering a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KG and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions for FKGC.
Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.
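As a rough illustration of one ingredient, a category-oriented triplet loss can pull each pixel feature toward its own class center and push it away from the nearest other center. The margin, normalization, and function names below are assumptions, not the paper's exact formulation:

```python
# Hedged sketch of a category-oriented triplet loss over source-domain
# category centers.
import torch
import torch.nn.functional as F

def category_triplet_loss(feats, labels, centers, margin=0.5):
    """feats: (N, D) pixel features; labels: (N,) long; centers: (C, D)."""
    feats = F.normalize(feats, dim=1)
    centers = F.normalize(centers, dim=1)
    d = torch.cdist(feats, centers)                  # (N, C) distances
    pos = d.gather(1, labels[:, None]).squeeze(1)    # distance to own center
    d_masked = d.scatter(1, labels[:, None], float("inf"))
    neg = d_masked.min(dim=1).values                 # nearest wrong center
    return F.relu(pos - neg + margin).mean()
```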
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from a trade-off between accuracy and efficiency. Recent advances in neural operators, a class of mesh-independent neural-network-based PDE solvers, suggest that this challenge can be overcome. In this emerging direction, the Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and on ERA5 (one of the largest high-resolution datasets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab for diverse applications involving partial differential equations.
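The underlying idea is to encode a field into a latent space where the dynamics act approximately linearly, advance it with a learned linear (Koopman) operator, and decode. A conceptual PyTorch sketch of that principle follows; it illustrates the idea only and is not KoopmanLab's actual API:

```python
# Toy Koopman-style stepper: encode -> apply linear operator K -> decode.
import torch
import torch.nn as nn

class ToyKoopmanStepper(nn.Module):
    def __init__(self, field_dim, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(field_dim, latent_dim), nn.GELU(),
                                     nn.Linear(latent_dim, latent_dim))
        self.K = nn.Linear(latent_dim, latent_dim, bias=False)  # Koopman operator
        self.decoder = nn.Linear(latent_dim, field_dim)

    def forward(self, u, steps=1):
        z = self.encoder(u)
        for _ in range(steps):        # long-horizon rollout = repeated K
            z = self.K(z)
        return self.decoder(z)

model = ToyKoopmanStepper(field_dim=256)
u_next = model(torch.randn(8, 256), steps=10)   # predict 10 steps ahead
```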
Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.
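The core mechanism, a style code modulating the feed-forward layers, can be sketched as below; the sigmoid-gain modulation and layer sizes are assumptions, not the exact StyleTalk design:

```python
# Hedged sketch of a style-aware adaptive feed-forward layer: the style
# code predicts per-channel gains that rescale the hidden units, so one
# decoder can render different speaking styles.
import torch
import torch.nn as nn

class StyleAdaptiveFFN(nn.Module):
    def __init__(self, d_model=256, d_ff=1024, d_style=128):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)
        self.to_gain = nn.Linear(d_style, d_ff)  # style code -> channel gains

    def forward(self, x, style_code):
        gain = torch.sigmoid(self.to_gain(style_code))   # (B, d_ff) in (0, 1)
        h = torch.relu(self.fc1(x)) * gain.unsqueeze(1)  # modulate hidden units
        return self.fc2(h)

ffn = StyleAdaptiveFFN()
out = ffn(torch.randn(2, 50, 256), torch.randn(2, 128))  # (B, T, d_model)
```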
Transformer has achieved impressive successes for various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement brought by ImageNet pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with the Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network representation of the same patch embedding tokens under a different perturbation. To maximally exploit the Transformer with limited medical data, we propose an auxiliary difficulty ranking task: the Transformer is enforced to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer is encouraged to distill transformation-invariant features from the perturbed tokens, simultaneously achieving difficulty measurement and maintaining the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification over ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
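The two-branch setup follows the BYOL pattern: the online branch predicts the target branch's representation of a differently perturbed view, and the target weights track the online weights by exponential moving average. A minimal sketch under assumed backbone and momentum choices (the auxiliary difficulty ranking head is omitted):

```python
# BYOL-style online/target branches with an EMA target update.
import copy
import torch
import torch.nn.functional as F

def ema_update(target, online, m=0.996):
    """Target weights = m * target + (1 - m) * online."""
    for pt, po in zip(target.parameters(), online.parameters()):
        pt.data.mul_(m).add_(po.data, alpha=1.0 - m)

online = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(),
                             torch.nn.Linear(64, 64))
target = copy.deepcopy(online)
for p in target.parameters():
    p.requires_grad_(False)        # target receives no gradients

v1, v2 = torch.randn(16, 128), torch.randn(16, 128)   # two perturbed views
loss = 2 - 2 * F.cosine_similarity(online(v1), target(v2).detach(), dim=-1).mean()
loss.backward()
ema_update(target, online)
```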
Nearest-Neighbor (NN) classification has proven to be a simple and effective approach for few-shot learning. The query data can be classified efficiently by finding the nearest support class based on features extracted by pretrained deep models. However, NN-based methods are sensitive to the data distribution and may produce false predictions if the samples in the support set happen to lie around the distribution boundary of different classes. To solve this issue, we present P3DC-Shot, an improved nearest-neighbor based few-shot classification method empowered by prior-driven data calibration. Inspired by the distribution calibration technique, which utilizes the distribution or statistics of the base classes to calibrate the data for few-shot tasks, we propose a novel discrete data calibration operation that is more suitable for NN-based few-shot classification. Specifically, we treat the prototypes representing each base class as priors and calibrate each support sample based on its similarity to the different base prototypes. Then, we perform NN classification using the discretely calibrated support data. Results from extensive experiments on various datasets show that our efficient, non-learning-based method can outperform, or at least be comparable to, SOTA methods that require additional learning steps.
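The calibration step can be pictured as shifting each support feature toward base prototypes in proportion to its similarity to them; the mixing rule and `alpha` below are illustrative, not the exact P3DC-Shot operation:

```python
# Hedged sketch: prior-driven support calibration, then NN classification.
import torch
import torch.nn.functional as F

def calibrate(support, base_protos, alpha=0.7):
    """support: (S, D) features; base_protos: (B, D) base-class priors."""
    s, p = F.normalize(support, dim=1), F.normalize(base_protos, dim=1)
    w = torch.softmax(s @ p.t(), dim=1)         # similarity to base priors
    return alpha * support + (1 - alpha) * w @ base_protos

def nn_classify(query, support, support_labels):
    """Assign each query the label of its nearest (cosine) support sample."""
    q, s = F.normalize(query, dim=1), F.normalize(support, dim=1)
    return support_labels[(q @ s.t()).argmax(dim=1)]
```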
In this paper, we investigate the joint device activity and data detection in massive machine-type communications (mMTC) with a one-phase non-coherent scheme, where data bits are embedded in the pilot sequences and the base station simultaneously detects active devices and their embedded data bits without explicit channel estimation. Due to the correlated sparsity pattern introduced by the non-coherent transmission scheme, the traditional approximate message passing (AMP) algorithm cannot achieve satisfactory performance. Therefore, we propose a deep learning (DL) modified AMP network (DL-mAMPnet) that enhances the detection performance by effectively exploiting the pilot activity correlation. The DL-mAMPnet is constructed by unfolding the AMP algorithm into a feedforward neural network, which combines the principled mathematical model of the AMP algorithm with the powerful learning capability, thereby benefiting from the advantages of both techniques. Trainable parameters are introduced in the DL-mAMPnet to approximate the correlated sparsity pattern and the large-scale fading coefficient. Moreover, a refinement module is designed to further advance the performance by utilizing the spatial feature caused by the correlated sparsity pattern. Simulation results demonstrate that the proposed DL-mAMPnet can significantly outperform traditional algorithms in terms of the symbol error rate performance.
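Algorithm unfolding maps each iteration of a thresholding recursion to one network layer with learnable parameters. A LISTA-style sketch of the general idea follows (the actual DL-mAMPnet layer includes the AMP Onsager correction and the refinement module, both omitted here):

```python
# Conceptual sketch of unfolding iterative thresholding into a network:
# each "layer" is one iteration with a learnable step size and threshold.
import torch
import torch.nn as nn

class UnfoldedLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.step = nn.Parameter(torch.tensor(1.0))    # learnable step size
        self.theta = nn.Parameter(torch.tensor(0.1))   # learnable threshold

    def forward(self, x, y, A):
        r = x + self.step * (A.t() @ (y - A @ x))      # gradient step toward y = A x
        return torch.sign(r) * torch.relu(r.abs() - self.theta)  # soft threshold

class UnfoldedNet(nn.Module):
    def __init__(self, num_layers=8):
        super().__init__()
        self.layers = nn.ModuleList(UnfoldedLayer() for _ in range(num_layers))

    def forward(self, y, A):
        x = torch.zeros(A.shape[1])                    # all-zero initial estimate
        for layer in self.layers:
            x = layer(x, y, A)
        return x

A = torch.randn(32, 128) / 32 ** 0.5                   # measurement (pilot) matrix
x_true = torch.zeros(128); x_true[torch.randperm(128)[:5]] = 1.0  # sparse activity
x_hat = UnfoldedNet()(A @ x_true, A)
```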